Wildlife Movement Prediction using Deep Learning 🦅

This project leverages deep learning techniques to analyze GPS tracking data of wildlife, focusing on bird migration patterns. Using geospatial and temporal data, we preprocess and visualize trajectories, compute inter-location distances using the Haversine formula, and apply sequential modeling techniques (e.g., GRU) to predict animal movement.


📂 Dataset Information

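  • Source: Movement Ecology Dataset (hosted via Google Drive), study "Navigation experiments in lesser black-backed gulls (data from Wikelski et al. 2015)"
  • Size: 89,867 rows × 15 columns
  • Fields: Timestamp, Location (lat/lon), Species, Sensor Type, Tag Identifiers, Vegetation Cover Indices, etc.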

💻 Project Goals

  • Clean and preprocess geospatial data
  • Compute movement trajectories
  • Build and train GRU-based deep learning model
  • Predict next location(s) in movement sequence
In [1]:
%pip install gdown geopy pandas numpy matplotlib seaborn scikit-learn tensorflow --quiet
Note: you may need to restart the kernel to use updated packages.
In [2]:
import numpy as np
import pandas as pd
import math
In [3]:
import gdown
import pandas as pd

# Extract the File ID from your link
file_id = "1o1umq9xOuvhE7rKWpop82tvkKYjtIPyX"  # Extracted from your Google Drive link

# Correct Google Drive direct download URL
download_url = f"https://drive.google.com/uc?id={file_id}"

# Define output file name
output_file = "migration_original.csv"

# Download the file
gdown.download(download_url, output_file, quiet=False)
Downloading...
From: https://drive.google.com/uc?id=1o1umq9xOuvhE7rKWpop82tvkKYjtIPyX
To: /Users/apple/Desktop/Wildlife_Movement_Prediction/migration_original.csv
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 22.3M/22.3M [02:38<00:00, 141kB/s]
Out[3]:
'migration_original.csv'
In [4]:
# Load the dataset
df = pd.read_csv('migration_original.csv')
print(df.shape)
df.head()
(89867, 15)
Out[4]:
event-id visible timestamp location-long location-lat manually-marked-outlier visible.1 sensor-type individual-taxon-canonical-name tag-local-identifier individual-local-identifier study-name ECMWF Interim Full Daily Invariant Low Vegetation Cover NCEP NARR SFC Vegetation at Surface ECMWF Interim Full Daily Invariant High Vegetation Cover
0 1082620685 True 2009-05-27 14:00:00.000 24.58617 61.24783 NaN True gps Larus fuscus 91732 91732A Navigation experiments in lesser black-backed ... 0.039229 NaN 0.960771
1 1082620686 True 2009-05-27 20:00:00.000 24.58217 61.23267 NaN True gps Larus fuscus 91732 91732A Navigation experiments in lesser black-backed ... 0.040803 NaN 0.959197
2 1082620687 True 2009-05-28 05:00:00.000 24.53133 61.18833 NaN True gps Larus fuscus 91732 91732A Navigation experiments in lesser black-backed ... 0.052201 NaN 0.947799
3 1082620688 True 2009-05-28 08:00:00.000 24.58200 61.23283 NaN True gps Larus fuscus 91732 91732A Navigation experiments in lesser black-backed ... 0.040818 NaN 0.959182
4 1082620689 True 2009-05-28 14:00:00.000 24.58250 61.23267 NaN True gps Larus fuscus 91732 91732A Navigation experiments in lesser black-backed ... 0.040753 NaN 0.959247
In [5]:
# Check for unique values in all the columns
for column in df.columns:
  print(f'The Unique Columns present in "{column}" are: ',df[column].unique(), "\n")
The Unique Columns present in "event-id" are:  [1082620685 1082620686 1082620687 ... 1082710937 1082710938 1082710939] 

The Unique Columns present in "visible" are:  [ True] 

The Unique Columns present in "timestamp" are:  ['2009-05-27 14:00:00.000' '2009-05-27 20:00:00.000'
 '2009-05-28 05:00:00.000' ... '2015-08-26 21:00:00.000'
 '2015-08-27 06:00:00.000' '2015-08-27 09:00:00.000'] 

The Unique Columns present in "location-long" are:  [24.58617 24.58217 24.53133 ... 35.69217 35.71483 35.66567] 

The Unique Columns present in "location-lat" are:  [61.24783 61.23267 61.18833 ... 64.95367 64.97133 65.019  ] 

The Unique Columns present in "manually-marked-outlier" are:  [nan] 

The Unique Columns present in "visible.1" are:  [ True] 

The Unique Columns present in "sensor-type" are:  ['gps'] 

The Unique Columns present in "individual-taxon-canonical-name" are:  ['Larus fuscus'] 

The Unique Columns present in "tag-local-identifier" are:  [91732 91733 91734 91735 91737 91738 91739 91740 91741 91742 91743 91744
 91745 91746 91747 91748 91749 91750 91751 91752 91754 91755 91756 91758
 91759 91761 91762 91763 91764 91765 91766 91767 91769 91771 91774 91775
 91776 91777 91778 91779 91780 91781 91782 91783 91785 91786 91787 91788
 91789 91794 91795 91797 91798 91799 91800 91802 91803 91807 91809 91810
 91811 91812 91813 91814 91815 91816 91819 91821 91823 91824 91825 91826
 91827 91828 91829 91830 91831 91832 91835 91836 91837 91838 91839 91843
 91845 91846 91848 91849 91852 91854 91858 91861 91862 91864 91865 91866
 91870 91871 91872 91875 91876 91877 91878 91880 91881 91884 91885 91893
 91894 91897 91900 91901 91903 91907 91908 91910 91911 91913 91916 91918
 91919 91920 91921 91924 91929 91930] 

The Unique Columns present in "individual-local-identifier" are:  ['91732A' '91733A' '91734A' '91735A' '91737A' '91738A' '91739A' '91740A'
 '91741A' '91742A' '91743A' '91744A' '91745A' '91746A' '91747A' '91748A'
 '91749A' '91750A' '91751A' '91752A' '91754A' '91755A' '91756A' '91758A'
 '91759A' '91761A' '91762A' '91763A' '91764A' '91765A' '91766A' '91767A'
 '91769A' '91771A' '91774A' '91775A' '91776A' '91777A' '91778A' '91779A'
 '91780A' '91781A' '91782A' '91783A' '91785A' '91786A' '91787A' '91788A'
 '91789A' '91794A' '91795A' '91797A' '91798A' '91799A' '91800A' '91802A'
 '91803A' '91807A' '91809A' '91810A' '91811A' '91812A' '91813A' '91814A'
 '91815A' '91816A' '91819A' '91821A' '91823A' '91824A' '91825A' '91826A'
 '91827A' '91828A' '91829A' '91830A' '91831A' '91832A' '91835A' '91836A'
 '91837A' '91838A' '91839A' '91843A' '91845A' '91846A' '91848A' '91849A'
 '91852A' '91854A' '91858A' '91861A' '91862A' '91864A' '91865A' '91866A'
 '91870A' '91871A' '91872A' '91875A' '91876A' '91877A' '91878A' '91880A'
 '91881A' '91884A' '91885A' '91893A' '91894A' '91897A' '91900A' '91901A'
 '91903A' '91907A' '91908A' '91910A' '91911A' '91913A' '91916A' '91918A'
 '91919A' '91920A' '91921A' '91924A' '91929A' '91930A'] 

The Unique Columns present in "study-name" are:  ['Navigation experiments in lesser black-backed gulls (data from Wikelski et al. 2015)'] 

The Unique Columns present in "ECMWF Interim Full Daily Invariant Low Vegetation Cover" are:  [0.03922896 0.0408028  0.0522006  ... 0.82435717 0.82432803 0.82430923] 

The Unique Columns present in "NCEP NARR SFC Vegetation at Surface" are:  [nan] 

The Unique Columns present in "ECMWF Interim Full Daily Invariant High Vegetation Cover" are:  [0.96077104 0.9591972  0.9477994  ... 0.17564283 0.17567197 0.17569077] 

In [6]:
'''
    1. Columns that are entirely null or hold a single value for the whole dataset
       carry no information for the model, so we drop them.

    2. "individual-local-identifier" is just "tag-local-identifier" with an "A" suffix,
       so the two are redundant and we keep only "tag-local-identifier".

    3. "ECMWF Interim Full Daily Invariant Low Vegetation Cover" and "ECMWF Interim Full Daily Invariant High Vegetation Cover"
       are complementary (in the rows shown above they sum to 1), so we keep only one of them.

    4. "event-id" is a per-row identifier and is dropped as well.
'''

# define Columns to drop
columns_to_drop = ["event-id","visible", "visible.1", "sensor-type", "individual-taxon-canonical-name", "study-name", "manually-marked-outlier",
                   "NCEP NARR SFC Vegetation at Surface", "individual-local-identifier", "ECMWF Interim Full Daily Invariant Low Vegetation Cover"]

# drop unwanted columns
data = df.drop(columns=columns_to_drop)
data.head()
Out[6]:
timestamp location-long location-lat tag-local-identifier ECMWF Interim Full Daily Invariant High Vegetation Cover
0 2009-05-27 14:00:00.000 24.58617 61.24783 91732 0.960771
1 2009-05-27 20:00:00.000 24.58217 61.23267 91732 0.959197
2 2009-05-28 05:00:00.000 24.53133 61.18833 91732 0.947799
3 2009-05-28 08:00:00.000 24.58200 61.23283 91732 0.959182
4 2009-05-28 14:00:00.000 24.58250 61.23267 91732 0.959247
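The drop list above was assembled by reading the unique-value printout. As a cross-check, the same candidates can be found programmatically. The following is a minimal sketch on a toy frame (the `toy` DataFrame and its values are illustrative stand-ins, not the real data), assuming that "uninformative" means at most one distinct non-null value:

```python
import pandas as pd

# Toy frame standing in for the raw export (values are illustrative only).
toy = pd.DataFrame({
    "visible": [True, True, True],
    "sensor-type": ["gps", "gps", "gps"],
    "manually-marked-outlier": [float("nan")] * 3,
    "location-lat": [61.24783, 61.23267, 61.18833],
})

# nunique(dropna=True) is 0 for an all-null column and 1 for a constant column.
uninformative = [col for col in toy.columns if toy[col].nunique(dropna=True) <= 1]
print(uninformative)
```

On the real frame, the same comprehension over `df.columns` would list the constant and all-null columns. Redundant pairs such as the two identifier columns still need a manual check.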
In [7]:
# Check for data types and Null counts using info() method
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89867 entries, 0 to 89866
Data columns (total 5 columns):
 #   Column                                                    Non-Null Count  Dtype  
---  ------                                                    --------------  -----  
 0   timestamp                                                 89867 non-null  object 
 1   location-long                                             89867 non-null  float64
 2   location-lat                                              89867 non-null  float64
 3   tag-local-identifier                                      89867 non-null  int64  
 4   ECMWF Interim Full Daily Invariant High Vegetation Cover  89867 non-null  float64
dtypes: float64(3), int64(1), object(1)
memory usage: 3.4+ MB
In [8]:
# -------------------- STEP 1: Parse Timestamps -------------------- #
# Ensure timestamps are in datetime format
data["timestamp"] = pd.to_datetime(data["timestamp"])
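The timestamps in this file follow a single fixed layout (for example `2009-05-27 14:00:00.000`), so the parse can be made explicit. This is a small sketch, assuming every row uses that layout; the `raw` series is a hypothetical stand-in for the `timestamp` column:

```python
import pandas as pd

# Hypothetical sample in the same layout as the notebook's timestamp column.
raw = pd.Series(["2009-05-27 14:00:00.000", "2009-05-27 20:00:00.000"])

# An explicit format makes malformed rows raise instead of being guessed at.
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S.%f")
print(parsed.dt.hour.tolist())
```

Passing `format=` is optional here; it mainly guards against silently mis-parsed rows.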
In [9]:
# -------------------- STEP 2: Sort Data by Tag and Time -------------------- #
# Sort by "tag-local-identifier" and then "timestamp" so each bird's fixes are contiguous and in chronological order
data = data.sort_values(by=["tag-local-identifier", "timestamp"]).reset_index(drop=True)
In [10]:
# -------------------- STEP 3: Extract Date-Time Features -------------------- #
data["year"] = data["timestamp"].dt.year
data["month"] = data["timestamp"].dt.month
data["hour"] = data["timestamp"].dt.hour
In [11]:
# STEP 4: Compute Time Difference per Bird
data["time_diff(hrs)"] = (
    data.groupby("tag-local-identifier")["timestamp"]
    .diff().dt.total_seconds() / 3600
)

# Replace NaN with 0 for the first row per bird (safe assignment)
data["time_diff(hrs)"] = data["time_diff(hrs)"].fillna(0)
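To see why the difference is taken per bird rather than over the whole column, here is a minimal sketch on a toy frame (the `tag` and `gap_hrs` names and all values are illustrative). The first fix of each bird gets a gap of 0 instead of a spurious gap measured from the previous bird's last fix:

```python
import pandas as pd

# Two hypothetical birds, already sorted by tag and timestamp.
toy = pd.DataFrame({
    "tag": [1, 1, 1, 2, 2],
    "timestamp": pd.to_datetime([
        "2009-05-27 14:00", "2009-05-27 20:00", "2009-05-28 05:00",
        "2009-06-01 08:00", "2009-06-01 11:00",
    ]),
})

# diff() restarts inside each group, so the first row of each bird is NaT.
toy["gap_hrs"] = toy.groupby("tag")["timestamp"].diff().dt.total_seconds() / 3600
toy["gap_hrs"] = toy["gap_hrs"].fillna(0)
print(toy["gap_hrs"].tolist())  # [0.0, 6.0, 9.0, 0.0, 3.0]
```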
In [12]:
data.head(10)
Out[12]:
timestamp location-long location-lat tag-local-identifier ECMWF Interim Full Daily Invariant High Vegetation Cover year month hour time_diff(hrs)
0 2009-05-27 14:00:00 24.58617 61.24783 91732 0.960771 2009 5 14 0.0
1 2009-05-27 20:00:00 24.58217 61.23267 91732 0.959197 2009 5 20 6.0
2 2009-05-28 05:00:00 24.53133 61.18833 91732 0.947799 2009 5 5 9.0
3 2009-05-28 08:00:00 24.58200 61.23283 91732 0.959182 2009 5 8 3.0
4 2009-05-28 14:00:00 24.58250 61.23267 91732 0.959247 2009 5 14 6.0
5 2009-05-28 20:00:00 24.58617 61.24767 91732 0.960761 2009 5 20 6.0
6 2009-05-29 05:00:00 24.58600 61.24767 91732 0.960736 2009 5 5 9.0
7 2009-05-29 08:00:00 24.58617 61.24767 91732 0.960761 2009 5 8 3.0
8 2009-05-29 14:00:00 24.58650 61.24750 91732 0.960799 2009 5 14 6.0
9 2009-05-29 20:00:00 24.56967 61.23883 91732 0.957722 2009 5 20 6.0

Haversine Formula

The Haversine formula gives the great-circle distance between two latitude/longitude points (here, the current and previous fix of a bird), treating the Earth as a sphere.

$$ a = \sin^2\left(\frac{\Delta\phi}{2}\right) + \cos(\phi_1) \cdot \cos(\phi_2) \cdot \sin^2\left(\frac{\Delta\lambda}{2}\right) $$

$$ c = 2 \cdot \text{atan2}\left(\sqrt{a}, \sqrt{1 - a}\right) $$

$$ d = R \cdot c $$

Where:

  • $ \phi_1, \phi_2 $ are the latitudes of the two points in radians, and $ \Delta\phi = \phi_2 - \phi_1 $,
  • $ \lambda_1, \lambda_2 $ are the longitudes of the two points in radians, and $ \Delta\lambda = \lambda_2 - \lambda_1 $,
  • $ R $ is the Earth's radius (mean radius = 6,371 km),
  • $ d $ is the distance between the points in kilometers.
In [13]:
# -------------------- STEP 5: Define Haversine Distance Function -------------------- #
def haversine(lat1, lon1, lat2, lon2):
    """Compute the great-circle distance (Haversine formula) between two GPS coordinates."""
    R = 6371  # Earth radius in kilometers
    phi1, phi2 = map(math.radians, [lat1, lat2])
    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)

    a = math.sin(delta_phi / 2)**2 + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2)**2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

    return R * c  # Distance in km
In [14]:
# -------------------- STEP 6: Compute Distance per Bird -------------------- #
# Compute previous lat/lon per bird before applying Haversine formula
data["prev_lat"] = data.groupby("tag-local-identifier")["location-lat"].shift(1)
data["prev_lon"] = data.groupby("tag-local-identifier")["location-long"].shift(1)

# Apply Haversine function to compute distances
data["distance(km)"] = data.apply(
    lambda row: haversine(row["prev_lat"], row["prev_lon"], row["location-lat"], row["location-long"])
    if pd.notna(row["prev_lat"]) and pd.notna(row["prev_lon"]) else 0, axis=1
)

# Drop temporary columns
data.drop(columns=["prev_lat", "prev_lon"], inplace=True)
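The row-wise `apply` above calls the scalar function once per row. As an alternative, the same formula can be evaluated on whole arrays at once. The sketch below is an assumption-labelled variant, not the notebook's method: `haversine_np` is a hypothetical helper, it uses the `arcsin` form of the formula, and it clips `a` to [0, 1] to guard against rounding. The sample coordinates are the first three fixes shown in the table earlier.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, as in the formula above

def haversine_np(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars or equal-length arrays in degrees."""
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2)
    )
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    a = np.clip(a, 0.0, 1.0)  # protect sqrt/arcsin from tiny rounding overshoot
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Previous and current fixes for the first two steps of tag 91732.
prev_lat = np.array([61.24783, 61.23267])
prev_lon = np.array([24.58617, 24.58217])
curr_lat = np.array([61.23267, 61.18833])
curr_lon = np.array([24.58217, 24.53133])

step_km = haversine_np(prev_lat, prev_lon, curr_lat, curr_lon)
print(step_km)  # roughly 1.70 km and 5.63 km
```

Applied to the shifted `prev_lat`/`prev_lon` columns, a function like this would return NaN for each bird's first row, which could then be filled with 0 as before.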
In [15]:
# ------------------------ STEP 7: Compute Speed (Avoid Division by Zero) ------------------------ #
data["speed(km/hr)"] = data["distance(km)"] / data["time_diff(hrs)"]

# Replace inf, -inf, NaN → First np.nan, then 0
data["speed(km/hr)"] = data["speed(km/hr)"].replace([np.inf, -np.inf], np.nan)
data["speed(km/hr)"] = data["speed(km/hr)"].fillna(0)  # Replace NaN with 0
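An equivalent way to avoid the division by zero is to mask zero gaps before dividing, so that no `inf` is produced in the first place. A minimal sketch with made-up values (`distance_km` and `gap_hrs` are illustrative series, not notebook columns):

```python
import numpy as np
import pandas as pd

distance_km = pd.Series([0.0, 1.7, 5.6, 2.0])
gap_hrs = pd.Series([0.0, 6.0, 9.0, 0.0])  # first fix and a duplicate timestamp

# where() turns non-positive gaps into NaN, so the division yields NaN rather than inf.
speed = distance_km.div(gap_hrs.where(gap_hrs > 0)).fillna(0.0)
print(speed.tolist())
```

Either approach reports a speed of 0 for rows without a usable time gap. Whether 0 is the right stand-in for "unknown" depends on how the feature is used downstream.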
In [16]:
!pip install folium
In [17]:
import folium
from folium.plugins import AntPath
import pandas as pd
import numpy as np
import ipywidgets as widgets
from IPython.display import display, clear_output
import matplotlib.pyplot as plt
from matplotlib import colormaps

# Ensure timestamp is in datetime format
data['timestamp'] = pd.to_datetime(data['timestamp'])

# Get unique years, months, and tags
unique_years = sorted(data['year'].unique())
unique_months = sorted(data['month'].unique())
unique_tags = sorted(data['tag-local-identifier'].unique())

# Set up color mapping for each unique tag
base_cmap = colormaps.get_cmap('tab10')
color_map = lambda i: base_cmap(i / max(len(unique_tags) - 1, 1))  # Normalize index
tag_colors = {
    tag: f"#{int(color_map(i)[0]*255):02x}{int(color_map(i)[1]*255):02x}{int(color_map(i)[2]*255):02x}"
    for i, tag in enumerate(unique_tags)
}

# Create multi-select lists for year and month selection
year_selector = widgets.SelectMultiple(
    options=unique_years,
    value=[unique_years[0]],
    description='Years',
    layout=widgets.Layout(height='100px', width='150px')
)

month_selector = widgets.SelectMultiple(
    options=unique_months,
    value=[unique_months[0]],
    description='Months',
    layout=widgets.Layout(height='100px', width='150px')
)

# Button to update the map
update_button = widgets.Button(description="Update Map")

# Output widget to display the map
output = widgets.Output()

# Function to plot movement interactively
def plot_movement_interactive(years, months):
    # Filter data
    filtered_data = data[data["year"].isin(years) & data["month"].isin(months)]

    if filtered_data.empty:
        with output:
            clear_output(wait=True)
            print("No data available for the selected period.")
        return None

    # Initialize the map at the first valid location
    first_point = (
        filtered_data.iloc[0]["location-lat"],
        filtered_data.iloc[0]["location-long"]
    )
    m = folium.Map(location=first_point, zoom_start=8)

    # Plot paths for each unique bird tag
    for tag in filtered_data["tag-local-identifier"].unique():
        bird_data = filtered_data[filtered_data["tag-local-identifier"] == tag]
        bird_color = tag_colors[tag]
        path = list(zip(bird_data["location-lat"], bird_data["location-long"]))

        # Draw movement path
        folium.PolyLine(path, color=bird_color, weight=2.5, opacity=0.8).add_to(m)

        # Add point markers
        for _, row in bird_data.iterrows():
            folium.CircleMarker(
                location=(row["location-lat"], row["location-long"]),
                radius=5,
                color=bird_color,
                fill=True,
                fill_color=bird_color,
                popup=f"Tag: {tag}<br>Time: {row['timestamp']}"
            ).add_to(m)

    return m

# Button click event handler
def on_button_click(b):
    output.clear_output(wait=True)
    selected_years = [int(year) for year in year_selector.value]
    selected_months = [int(month) for month in month_selector.value]
    map_plot = plot_movement_interactive(selected_years, selected_months)

    if map_plot:
        with output:
            display(map_plot)

update_button.on_click(on_button_click)

# Display UI
display(widgets.HBox([year_selector, month_selector]))
display(update_button, output)
HBox(children=(SelectMultiple(description='Years', index=(0,), layout=Layout(height='100px', width='150px'), o…
Button(description='Update Map', style=ButtonStyle())
Output()
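The `tab10` palette used above has ten entries, so with more than ten tags several birds end up sharing a color. If visually distinct colors per tag matter, one option is to spread hues evenly around the color wheel. The sketch below uses only the standard library. `distinct_hex_colors` and `tag_colors_demo` are hypothetical names, and the saturation and value defaults are arbitrary choices.

```python
import colorsys

def distinct_hex_colors(n, saturation=0.65, value=0.9):
    """Return n hex colors with evenly spaced hues (hypothetical helper)."""
    colors = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / max(n, 1), saturation, value)
        colors.append(f"#{int(r * 255):02x}{int(g * 255):02x}{int(b * 255):02x}")
    return colors

# Illustrative subset of tags; the real list would be unique_tags.
tags = [91732, 91733, 91734, 91735]
tag_colors_demo = dict(zip(tags, distinct_hex_colors(len(tags))))
print(tag_colors_demo)
```

Neighbouring hues become hard to tell apart as the number of tags grows, so this helps most when only a subset of birds is drawn at once.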
In [18]:
import folium
import pandas as pd
import ipywidgets as widgets
from IPython.display import display, clear_output
from folium.plugins import TimestampedGeoJson
import matplotlib.pyplot as plt
from matplotlib import colormaps

# Ensure timestamp is in datetime format
data['timestamp'] = pd.to_datetime(data['timestamp'])

# Get unique tag-local-identifiers
unique_tags = sorted(data['tag-local-identifier'].unique())

# Assign each tag a color from the 10-color 'tab10' palette (tags outnumber colors, so colors repeat)
base_cmap = colormaps.get_cmap('tab10')
color_map = lambda i: base_cmap(i / max(len(unique_tags) - 1, 1))
tag_colors = {
    tag: f"#{int(color_map(i)[0]*255):02x}{int(color_map(i)[1]*255):02x}{int(color_map(i)[2]*255):02x}"
    for i, tag in enumerate(unique_tags)
}

# Create dropdowns for tag, year, and month selection
tag_selector = widgets.Dropdown(
    options=unique_tags,
    value=unique_tags[0],
    description='Tag:',
    layout=widgets.Layout(width='200px')
)

year_selector = widgets.Dropdown(
    options=[],
    description='Year:',
    layout=widgets.Layout(width='200px')
)

month_selector = widgets.Dropdown(
    options=[],
    description='Month:',
    layout=widgets.Layout(width='200px')
)

update_button = widgets.Button(description="Update Map")
output = widgets.Output()

# Update year and month dropdowns dynamically based on tag selection
def update_year_month_dropdowns(tag):
    filtered_data = data[data["tag-local-identifier"] == tag]
    unique_years = sorted(filtered_data['year'].unique())
    unique_months = sorted(filtered_data['month'].unique())

    year_selector.options = unique_years
    month_selector.options = unique_months

    if unique_years:
        year_selector.value = unique_years[0]
    if unique_months:
        month_selector.value = unique_months[0]

# Movement plotting function with animation
def plot_movement_interactive(tag, year, month):
    filtered_data = data[
        (data["tag-local-identifier"] == tag) &
        (data["year"] == year) &
        (data["month"] == month)
    ]

    if filtered_data.empty:
        with output:
            clear_output(wait=True)
            print("No data available for the selected tag, year, and month.")
        return None

    filtered_data = filtered_data.sort_values(by="timestamp")
    bird_color = tag_colors[tag]

    first_point = (filtered_data.iloc[0]["location-lat"], filtered_data.iloc[0]["location-long"])
    m = folium.Map(location=first_point, zoom_start=8)

    features = []
    path_coordinates = []

    for _, row in filtered_data.iterrows():
        point_feature = {
            'type': 'Feature',
            'geometry': {
                'type': 'Point',
                'coordinates': [row["location-long"], row["location-lat"]]
            },
            'properties': {
                'time': row['timestamp'].isoformat(),
                'popup': f"Tag: {tag}<br>Time: {row['timestamp']}",
                'icon': 'circle',
                'iconstyle': {
                    'fillColor': bird_color,
                    'fillOpacity': 0.6,
                    'stroke': False,  # boolean, serialized as JSON false (the string 'false' is truthy in JavaScript)
                    'radius': 5
                }
            }
        }
        features.append(point_feature)
        path_coordinates.append([row["location-long"], row["location-lat"]])

    line_feature = {
        'type': 'Feature',
        'geometry': {
            'type': 'LineString',
            'coordinates': path_coordinates
        },
        'properties': {
            'times': [row['timestamp'].isoformat() for _, row in filtered_data.iterrows()],
            'style': {
                'color': bird_color,
                'weight': 2
            }
        }
    }
    features.append(line_feature)

    TimestampedGeoJson(
        {'type': 'FeatureCollection', 'features': features},
        period='PT1M',
        add_last_point=True,
        auto_play=True,
        loop=False,
        max_speed=30,
        loop_button=True,
        date_options='YYYY/MM/DD HH:mm:ss',
        time_slider_drag_update=True
    ).add_to(m)

    return m

# Button click logic
def on_button_click(b):
    output.clear_output(wait=True)
    selected_tag = tag_selector.value
    selected_year = year_selector.value
    selected_month = month_selector.value
    map_plot = plot_movement_interactive(selected_tag, selected_year, selected_month)
    if map_plot:
        with output:
            display(map_plot)

# Link dropdown updates to tag selection
def on_tag_change(change):
    update_year_month_dropdowns(change['new'])

tag_selector.observe(on_tag_change, names='value')
update_year_month_dropdowns(tag_selector.value)

update_button.on_click(on_button_click)

# Display all UI components
display(widgets.VBox([tag_selector, year_selector, month_selector]))
display(update_button, output)
VBox(children=(Dropdown(description='Tag:', layout=Layout(width='200px'), options=(np.int64(91732), np.int64(9…
Button(description='Update Map', style=ButtonStyle())
Output()
In [19]:
# -------------------- STEP 8: Compute Bearing (Direction of Movement) -------------------- #

# Function to calculate bearing between two GPS points
def calculate_bearing(lat1, lon1, lat2, lon2):
    """
    Calculate the initial bearing (direction) from point (lat1, lon1) to (lat2, lon2).
    The result is in degrees (0° = North, 90° = East, 180° = South, 270° = West).
    """
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])

    delta_lon = lon2 - lon1
    x = np.sin(delta_lon) * np.cos(lat2)
    y = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * np.cos(delta_lon)

    initial_bearing = np.arctan2(x, y)
    initial_bearing = np.degrees(initial_bearing)

    return (initial_bearing + 360) % 360  # Normalize to 0-360 degrees

# Initialize a new bearing column
data["bearing"] = np.nan  # Start with NaN for all rows

# Compute bearing for each bird (tag) individually
for tag in data["tag-local-identifier"].unique():
    tag_data = data[data["tag-local-identifier"] == tag].copy()
    tag_data.sort_values("timestamp", inplace=True)

    # Shifted coordinates to get previous point
    lat1 = tag_data["location-lat"].shift(1)
    lon1 = tag_data["location-long"].shift(1)
    lat2 = tag_data["location-lat"]
    lon2 = tag_data["location-long"]

    # Compute bearing
    bearings = calculate_bearing(lat1, lon1, lat2, lon2)

    # Fill NaN with 0 and assign to main DataFrame
    data.loc[tag_data.index, "bearing"] = bearings.fillna(0)
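A quick way to check the bearing convention is to test the four cardinal directions. The sketch below restates the same formula in a standalone helper (`initial_bearing` is a hypothetical name chosen to keep the example self-contained) and evaluates it on one-degree steps from simple reference points:

```python
import numpy as np

def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial bearing in degrees, clockwise from north, from point 1 to point 2."""
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    dlon = lon2 - lon1
    x = np.sin(dlon) * np.cos(lat2)
    y = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * np.cos(dlon)
    return (np.degrees(np.arctan2(x, y)) + 360) % 360

north = initial_bearing(0, 0, 1, 0)  # moving up in latitude
east = initial_bearing(0, 0, 0, 1)   # moving up in longitude along the equator
south = initial_bearing(1, 0, 0, 0)
west = initial_bearing(0, 1, 0, 0)
print(north, east, south, west)  # approximately 0, 90, 180, 270
```

Because a missing bearing is filled with 0 above, the first fix of each bird is indistinguishable from a due-north move. A separate indicator column would be one way to keep the two apart.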
In [20]:
# -------------------- STEP 9: Encode Cyclic Time Features (Preserve Temporal Patterns) -------------------- #
# Convert hour to cyclic feature
data["hour_sin"] = np.sin(2 * np.pi * data["hour"] / 24)
data["hour_cos"] = np.cos(2 * np.pi * data["hour"] / 24)

# Convert month to cyclic feature
data["month_sin"] = np.sin(2 * np.pi * data["month"] / 12)
data["month_cos"] = np.cos(2 * np.pi * data["month"] / 12)

# Drop original columns
data.drop(["hour", "month"], axis=1, inplace=True)
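The point of the sine/cosine pair is that values at the two ends of the cycle end up close together. The following sketch measures the distance between encoded hours. `encode_cyclic` and `cyclic_gap` are hypothetical helpers written for this illustration.

```python
import numpy as np

def encode_cyclic(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * np.pi * np.asarray(value, dtype=float) / period
    return np.sin(angle), np.cos(angle)

def cyclic_gap(a, b, period):
    """Euclidean distance between two encoded values."""
    sin_a, cos_a = encode_cyclic(a, period)
    sin_b, cos_b = encode_cyclic(b, period)
    return float(np.hypot(sin_a - sin_b, cos_a - cos_b))

gap_23_0 = cyclic_gap(23, 0, 24)  # 23:00 vs 00:00, one hour apart
gap_0_1 = cyclic_gap(0, 1, 24)    # 00:00 vs 01:00, one hour apart
gap_0_12 = cyclic_gap(0, 12, 24)  # opposite sides of the day
print(gap_23_0, gap_0_1, gap_0_12)
```

With the raw hour, 23 and 0 differ by 23. In the encoded space they are as close as 0 and 1, while midnight and noon are at the maximum distance of 2.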
In [21]:
# -------------------- STEP 10: Reorder Columns -------------------- #
desired_order = ["tag-local-identifier", "timestamp", "year", "month_sin", "month_cos", "hour_sin","hour_cos", "time_diff(hrs)", "distance(km)", "speed(km/hr)",
                 "ECMWF Interim Full Daily Invariant High Vegetation Cover", "bearing",
                "location-long", "location-lat"
                ]
data = data[desired_order]
In [22]:
data.head()
Out[22]:
tag-local-identifier timestamp year month_sin month_cos hour_sin hour_cos time_diff(hrs) distance(km) speed(km/hr) ECMWF Interim Full Daily Invariant High Vegetation Cover bearing location-long location-lat
0 91732 2009-05-27 14:00:00 2009 0.5 -0.866025 -0.500000 -0.866025 0.0 0.000000 0.000000 0.960771 0.000000 24.58617 61.24783
1 91732 2009-05-27 20:00:00 2009 0.5 -0.866025 -0.866025 0.500000 6.0 1.699244 0.283207 0.959197 187.236713 24.58217 61.23267
2 91732 2009-05-28 05:00:00 2009 0.5 -0.866025 0.965926 0.258819 9.0 5.632120 0.625791 0.947799 208.929407 24.53133 61.18833
3 91732 2009-05-28 08:00:00 2009 0.5 -0.866025 0.866025 -0.500000 3.0 5.643315 1.881105 0.959182 28.716637 24.58200 61.23283
4 91732 2009-05-28 14:00:00 2009 0.5 -0.866025 -0.500000 -0.866025 6.0 0.032131 0.005355 0.959247 123.620959 24.58250 61.23267
In [23]:
data.drop(columns=['year', 'time_diff(hrs)', 'ECMWF Interim Full Daily Invariant High Vegetation Cover'], inplace=True)
In [24]:
data.head()
Out[24]:
tag-local-identifier timestamp month_sin month_cos hour_sin hour_cos distance(km) speed(km/hr) bearing location-long location-lat
0 91732 2009-05-27 14:00:00 0.5 -0.866025 -0.500000 -0.866025 0.000000 0.000000 0.000000 24.58617 61.24783
1 91732 2009-05-27 20:00:00 0.5 -0.866025 -0.866025 0.500000 1.699244 0.283207 187.236713 24.58217 61.23267
2 91732 2009-05-28 05:00:00 0.5 -0.866025 0.965926 0.258819 5.632120 0.625791 208.929407 24.53133 61.18833
3 91732 2009-05-28 08:00:00 0.5 -0.866025 0.866025 -0.500000 5.643315 1.881105 28.716637 24.58200 61.23283
4 91732 2009-05-28 14:00:00 0.5 -0.866025 -0.500000 -0.866025 0.032131 0.005355 123.620959 24.58250 61.23267
In [25]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Calculate speed for each bird using Haversine formula
data['speed'] = 0.0
data['distance'] = 0.0

for tag in data["tag-local-identifier"].unique():
    tag_data = data[data["tag-local-identifier"] == tag].copy()

    lat1 = np.radians(tag_data["location-lat"].shift(1))
    lon1 = np.radians(tag_data["location-long"].shift(1))
    lat2 = np.radians(tag_data["location-lat"])
    lon2 = np.radians(tag_data["location-long"])

    dlat = lat2 - lat1
    dlon = lon2 - lon1

    a = np.sin(dlat / 2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2)**2
    c = 2 * np.arcsin(np.sqrt(a))
    distance = 6371 * c

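    time_diff = tag_data["timestamp"].diff().dt.total_seconds() / 3600
    speed = distance / time_diff
    # A zero time gap yields inf; treat inf and NaN (first fix per bird) as 0
    speed = speed.replace([np.inf, -np.inf], np.nan).fillna(0)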

    data.loc[tag_data.index, "speed"] = speed
    data.loc[tag_data.index, "distance"] = distance.fillna(0)

# Select one bird and work on a copy so the fill below does not modify `data`
bird_tag = 91732
bird_data = data[data["tag-local-identifier"] == bird_tag].copy()

# Replace any missing bearings with 0
bird_data["bearing"] = bird_data["bearing"].fillna(0)


# Plotting
fig, axes = plt.subplots(2, 3, figsize=(16, 10))
fig.suptitle(f"Visualizations for Bird Tag: {bird_tag}", fontsize=16)

# Top row
axes[0, 0].hist(bird_data["speed"], bins=50, color='skyblue')
axes[0, 0].set_title("Speed Distribution")
axes[0, 0].set_xlabel("Speed (km/hr)")

axes[0, 1].scatter(bird_data["location-long"], bird_data["location-lat"], c='blue', s=10)
axes[0, 1].set_title("Path (Lat vs Long)")
axes[0, 1].set_xlabel("Longitude")
axes[0, 1].set_ylabel("Latitude")

axes[0, 2].boxplot(
    [bird_data["speed"], bird_data["distance"], bird_data["bearing"]],
    tick_labels=["Speed", "Distance", "Bearing"]
)

axes[0, 2].set_title("Outlier Detection (Boxplots)")

# Bottom row: speed time series, distance vs speed, bearing histogram
# Time Series: Speed
axes[1, 0].plot(bird_data["timestamp"], bird_data["speed"], color='green')
axes[1, 0].set_title("Speed over Time")
axes[1, 0].set_xlabel("Time")
axes[1, 0].tick_params(axis='x', rotation=45)

# Scatter: Distance vs Speed
axes[1, 1].scatter(bird_data["distance"], bird_data["speed"], alpha=0.5, color='purple')
axes[1, 1].set_title("Distance vs Speed")
axes[1, 1].set_xlabel("Distance (km)")
axes[1, 1].set_ylabel("Speed (km/hr)")

# Histogram: Bearings
axes[1, 2].hist(bird_data["bearing"], bins=30, color='orange')
axes[1, 2].set_title("Bearing Distribution")
axes[1, 2].set_xlabel("Bearing (°)")

plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()
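[Figure: 2×3 grid of plots for bird tag 91732: Speed Distribution, Path (Lat vs Long), Outlier Detection (Boxplots), Speed over Time, Distance vs Speed, Bearing Distribution]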
In [26]:
import pandas as pd
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import display

# Function to calculate time difference and plot graphs
def plot_time_gap_analysis(tag_id):
    """
    Analyzes the relationship between time intervals and speed/distance.

    Parameters:
    tag_id (int or str): The unique identifier of the bird.
    """
    data_tag = data[data['tag-local-identifier'] == tag_id].copy()
    data_tag['timestamp'] = pd.to_datetime(data_tag['timestamp'])
    data_tag = data_tag.sort_values(by='timestamp')

    # Compute time difference in hours
    data_tag['time_diff'] = data_tag['timestamp'].diff().dt.total_seconds() / 3600

    if data_tag.empty:
        print(f"No data available for tag {tag_id}")
        return

    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    fig.suptitle(f'Time Interval Analysis for Bird {tag_id}', fontsize=14)

    # Histogram of time intervals
    axes[0].hist(data_tag['time_diff'].dropna(), bins=30, edgecolor='black')
    axes[0].set_title('Time Interval Distribution')
    axes[0].set_xlabel('Time Interval (hours)')
    axes[0].set_ylabel('Frequency')

    # Scatter plot of distance vs. time interval
    axes[1].scatter(data_tag['time_diff'], data_tag['distance(km)'], alpha=0.5)
    axes[1].set_title('Distance vs. Time Interval')
    axes[1].set_xlabel('Time Interval (hours)')
    axes[1].set_ylabel('Distance (km)')

    # Scatter plot of speed vs. time interval
    axes[2].scatter(data_tag['time_diff'], data_tag['speed(km/hr)'], alpha=0.5)
    axes[2].set_title('Speed vs. Time Interval')
    axes[2].set_xlabel('Time Interval (hours)')
    axes[2].set_ylabel('Speed (km/hr)')

    plt.show()

# Create dropdown widget to select bird tag
tag_selector = widgets.Dropdown(
    options=data['tag-local-identifier'].unique(),
    description='Select Tag:',
    style={'description_width': 'initial'}
)

# Link the dropdown to the plotting function; the returned widget renders as the cell output
widgets.interactive(plot_time_gap_analysis, tag_id=tag_selector)
Out[26]:
interactive(children=(Dropdown(description='Select Tag:', options=(np.int64(91732), np.int64(91733), np.int64(…
In [27]:
import numpy as np
import pandas as pd
import folium
from folium.plugins import MarkerCluster
from sklearn.cluster import DBSCAN

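# Assumes `data` has columns: ['timestamp', 'location-lat', 'location-long', 'speed(km/hr)']
resting_threshold = 0.025  # km/hr
resting_points = data[data['speed(km/hr)'] <= resting_threshold].copy()

# Clustering with DBSCAN
# With metric='haversine', scikit-learn expects [lat, lon] in radians and reads eps
# in radians as well: 0.1 rad is roughly 637 km on Earth (0.1 * 6371 km).
# To set eps as a ground distance, divide the distance in km by 6371.
epsilon = 0.1  # Adjust based on typical stopover site size
min_samples = 10  # Minimum points to form a cluster
coords_rad = np.radians(resting_points[['location-lat', 'location-long']])
db = DBSCAN(eps=epsilon, min_samples=min_samples, metric='haversine').fit(coords_rad)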

# Assign cluster labels
resting_points['cluster'] = db.labels_

# Create a folium map centered at the first resting point
center_lat, center_long = resting_points.iloc[0][['location-lat', 'location-long']]
m = folium.Map(location=[center_lat, center_long], zoom_start=8)

# Color mapping for clusters
colors = ['red', 'blue', 'green', 'purple', 'orange', 'darkred', 'lightblue', 'pink', 'black', 'gray']
marker_cluster = MarkerCluster().add_to(m)

# Plot resting points with cluster colors
for _, row in resting_points.iterrows():
    cluster = int(row['cluster'])  # iterrows can upcast integer labels to float
    color = colors[cluster % len(colors)] if cluster != -1 else "black"  # Noise in black
    folium.CircleMarker(
        location=[row['location-lat'], row['location-long']],
        radius=4,
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.7,
        popup=f"Cluster: {cluster}"
    ).add_to(marker_cluster)

# Show the map
m
Out[27]:
[Interactive folium map: resting points colored by DBSCAN cluster, noise in black]
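With `metric='haversine'`, scikit-learn's DBSCAN measures distances as angles in radians, so `eps` is not a distance in kilometres. The sketch below shows one way to convert between the two. It assumes a mean Earth radius of 6371 km, and the helper names `km_to_radians` and `radians_to_km` are illustrative, not part of the notebook.

```python
EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def km_to_radians(distance_km):
    """Convert a ground distance in km to an angle in radians on the sphere."""
    return distance_km / EARTH_RADIUS_KM


def radians_to_km(angle_rad):
    """Convert an angle in radians on the sphere to a ground distance in km."""
    return angle_rad * EARTH_RADIUS_KM


# The eps of 0.1 rad used in the clustering cell, expressed as a ground distance.
current_eps_km = radians_to_km(0.1)

# A hypothetical 10 km stopover radius, expressed in radians so it can be passed as eps.
eps_10km = km_to_radians(10.0)
```

Whether a stopover site is better described by a radius of a few kilometres or a few hundred depends on the species and the sampling interval, so the 10 km figure is only a placeholder.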
In [28]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense, Dropout, Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping



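# Features: every column except the timestamp and the coordinates being predicted.
# Targets: the (longitude, latitude) of each fix.
X = data.drop(columns=['timestamp', 'location-long', 'location-lat']).to_numpy()
y = data[['location-long', 'location-lat']].to_numpy()
tags = data['tag-local-identifier'].to_numpy()

unique_tags = np.unique(tags)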


def create_sequences(tag_data, tag_labels, seq_length):
    """Build sliding windows of `seq_length` rows, each paired with the label of the row that follows."""
    X_seq, y_seq = [], []
    for i in range(len(tag_data) - seq_length):
        X_seq.append(tag_data[i:i + seq_length])
        y_seq.append(tag_labels[i + seq_length])
    return np.array(X_seq), np.array(y_seq)


X_sequences, y_sequences = [], []
sequence_length = 10  # Number of consecutive fixes fed to the model per sample

# Build windows per tag so that no sequence mixes fixes from two birds
for tag in unique_tags:
    tag_indices = np.where(tags == tag)[0]  # Row indices for this tag
    tag_data = X[tag_indices]
    tag_labels = y[tag_indices]

    # Skip tags with too few fixes to form a single window
    if len(tag_data) > sequence_length:
        X_seq, y_seq = create_sequences(tag_data, tag_labels, sequence_length)
        X_sequences.append(X_seq)
        y_sequences.append(y_seq)

# Stack the per-tag windows into single arrays
X_sequences = np.vstack(X_sequences)
y_sequences = np.vstack(y_sequences)

# Train/validation/test split (80/10/10); windows are shuffled at random
X_train, X_temp, y_train, y_temp = train_test_split(X_sequences, y_sequences, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)


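# Fit the feature scaler on the training set only, then apply it to validation and test.
# Windows are flattened to 2D for scaling and reshaped back to (samples, steps, features).
scaler_X = StandardScaler()
X_train = scaler_X.fit_transform(X_train.reshape(-1, X_train.shape[2])).reshape(X_train.shape)
X_val = scaler_X.transform(X_val.reshape(-1, X_val.shape[2])).reshape(X_val.shape)
X_test = scaler_X.transform(X_test.reshape(-1, X_test.shape[2])).reshape(X_test.shape)

# Scale the targets the same way; scaler_y is reused later to invert the predictions
scaler_y = StandardScaler()
y_train = scaler_y.fit_transform(y_train)
y_val = scaler_y.transform(y_val)
y_test = scaler_y.transform(y_test)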


num_features = X_train.shape[2]

# Two stacked GRU layers followed by a small dense head that outputs (longitude, latitude)
model = Sequential([
    Input(shape=(sequence_length, num_features)),
    GRU(64, return_sequences=True),   # Pass the full sequence to the next GRU
    GRU(32, return_sequences=False),  # Keep only the final hidden state
    Dense(16, activation='relu'),
    Dense(2)  # Scaled (longitude, latitude)
])



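model.compile(optimizer=Adam(learning_rate=0.001), loss='mse', metrics=['mae'])

# Train for up to 100 epochs; stop once validation loss has not improved for 10 epochs
# and restore the weights from the best epoch
history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=100,
    batch_size=32,
    callbacks=[EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)]
)

# Save the trained model in the native Keras format
model.save("my_model.keras")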



plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Training History')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
Epoch 1/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 8s 3ms/step - loss: 0.4588 - mae: 0.4950 - val_loss: 0.2316 - val_mae: 0.3311
Epoch 2/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.2170 - mae: 0.3198 - val_loss: 0.1773 - val_mae: 0.2852
Epoch 3/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.1739 - mae: 0.2770 - val_loss: 0.1543 - val_mae: 0.2558
Epoch 4/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.1350 - mae: 0.2381 - val_loss: 0.1219 - val_mae: 0.2231
Epoch 5/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.1141 - mae: 0.2155 - val_loss: 0.1018 - val_mae: 0.2022
Epoch 6/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0982 - mae: 0.1960 - val_loss: 0.0916 - val_mae: 0.1869
Epoch 7/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0881 - mae: 0.1836 - val_loss: 0.0863 - val_mae: 0.1780
Epoch 8/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0784 - mae: 0.1710 - val_loss: 0.0736 - val_mae: 0.1629
Epoch 9/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0685 - mae: 0.1575 - val_loss: 0.0683 - val_mae: 0.1579
Epoch 10/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0642 - mae: 0.1508 - val_loss: 0.0661 - val_mae: 0.1482
Epoch 11/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0593 - mae: 0.1442 - val_loss: 0.0730 - val_mae: 0.1526
Epoch 12/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0544 - mae: 0.1372 - val_loss: 0.0575 - val_mae: 0.1419
Epoch 13/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0501 - mae: 0.1320 - val_loss: 0.0577 - val_mae: 0.1389
Epoch 14/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0470 - mae: 0.1282 - val_loss: 0.0504 - val_mae: 0.1320
Epoch 15/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0452 - mae: 0.1241 - val_loss: 0.0518 - val_mae: 0.1289
Epoch 16/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0422 - mae: 0.1200 - val_loss: 0.0527 - val_mae: 0.1291
Epoch 17/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0395 - mae: 0.1162 - val_loss: 0.0451 - val_mae: 0.1255
Epoch 18/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0371 - mae: 0.1139 - val_loss: 0.0481 - val_mae: 0.1262
Epoch 19/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0355 - mae: 0.1116 - val_loss: 0.0646 - val_mae: 0.1355
Epoch 20/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0345 - mae: 0.1093 - val_loss: 0.0417 - val_mae: 0.1177
Epoch 21/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0324 - mae: 0.1062 - val_loss: 0.0405 - val_mae: 0.1150
Epoch 22/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0336 - mae: 0.1069 - val_loss: 0.0386 - val_mae: 0.1114
Epoch 23/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0305 - mae: 0.1022 - val_loss: 0.0400 - val_mae: 0.1148
Epoch 24/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0291 - mae: 0.1015 - val_loss: 0.0404 - val_mae: 0.1167
Epoch 25/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0270 - mae: 0.0980 - val_loss: 0.0479 - val_mae: 0.1223
Epoch 26/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0267 - mae: 0.0972 - val_loss: 0.0376 - val_mae: 0.1089
Epoch 27/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0258 - mae: 0.0951 - val_loss: 0.0349 - val_mae: 0.1062
Epoch 28/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0252 - mae: 0.0942 - val_loss: 0.0373 - val_mae: 0.1081
Epoch 29/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0235 - mae: 0.0916 - val_loss: 0.0359 - val_mae: 0.1089
Epoch 30/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0231 - mae: 0.0913 - val_loss: 0.0410 - val_mae: 0.1106
Epoch 31/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0225 - mae: 0.0893 - val_loss: 0.0319 - val_mae: 0.1043
Epoch 32/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0255 - mae: 0.0927 - val_loss: 0.0309 - val_mae: 0.1008
Epoch 33/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0236 - mae: 0.0896 - val_loss: 0.0298 - val_mae: 0.0999
Epoch 34/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0206 - mae: 0.0863 - val_loss: 0.0317 - val_mae: 0.1015
Epoch 35/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0197 - mae: 0.0850 - val_loss: 0.0339 - val_mae: 0.1075
Epoch 36/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0205 - mae: 0.0859 - val_loss: 0.0453 - val_mae: 0.1149
Epoch 37/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0196 - mae: 0.0843 - val_loss: 0.0361 - val_mae: 0.1074
Epoch 38/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0217 - mae: 0.0873 - val_loss: 0.0278 - val_mae: 0.0943
Epoch 39/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0191 - mae: 0.0823 - val_loss: 0.0268 - val_mae: 0.0936
Epoch 40/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0194 - mae: 0.0831 - val_loss: 0.0274 - val_mae: 0.0941
Epoch 41/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0175 - mae: 0.0798 - val_loss: 0.0281 - val_mae: 0.0940
Epoch 42/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 8s 4ms/step - loss: 0.0193 - mae: 0.0820 - val_loss: 0.0294 - val_mae: 0.0969
Epoch 43/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 8s 3ms/step - loss: 0.0167 - mae: 0.0784 - val_loss: 0.0282 - val_mae: 0.0947
Epoch 44/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 8s 3ms/step - loss: 0.0195 - mae: 0.0822 - val_loss: 0.0301 - val_mae: 0.0957
Epoch 45/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0174 - mae: 0.0786 - val_loss: 0.0284 - val_mae: 0.0958
Epoch 46/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0177 - mae: 0.0785 - val_loss: 0.0276 - val_mae: 0.0919
Epoch 47/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 8s 4ms/step - loss: 0.0151 - mae: 0.0753 - val_loss: 0.0303 - val_mae: 0.0963
Epoch 48/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 8s 4ms/step - loss: 0.0160 - mae: 0.0765 - val_loss: 0.0288 - val_mae: 0.0964
Epoch 49/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 8s 4ms/step - loss: 0.0160 - mae: 0.0765 - val_loss: 0.0298 - val_mae: 0.0966
[Figure: Model Training History, showing training and validation loss per epoch]
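Consecutive windows from `create_sequences` share all but one row, so a random `train_test_split` can place near-duplicate windows in both the training and test sets, which may make the test error look optimistic. One alternative, sketched here as an assumption and not as the notebook's method, is to split each tag's windows chronologically. The helper `chronological_split` and its default fractions are hypothetical, and it presumes the windows for a tag are already in time order.

```python
def chronological_split(windows, labels, train_frac=0.8, val_frac=0.1):
    """Split one tag's time-ordered windows into train/val/test without shuffling.

    `windows` and `labels` can be any equally long sliceable sequences
    (lists or NumPy arrays). The earliest windows go to training and the
    latest to validation and test.
    """
    if len(windows) != len(labels):
        raise ValueError("windows and labels must have the same length")
    n = len(windows)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return (
        (windows[:train_end], labels[:train_end]),
        (windows[train_end:val_end], labels[train_end:val_end]),
        (windows[val_end:], labels[val_end:]),
    )


# Toy example: 20 windows identified by their position in time.
toy_windows = list(range(20))
toy_labels = [w + 10 for w in toy_windows]
train, val, test = chronological_split(toy_windows, toy_labels)
```

In the notebook this would be called inside the per-tag loop, with the three parts collected separately before stacking. Windows next to a boundary still share rows with the neighbouring split; dropping `sequence_length - 1` windows at each boundary would remove that overlap.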
In [29]:
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error
from tensorflow.keras.models import load_model
from tensorflow.keras.losses import MeanSquaredError

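# Load the trained model from the 'models' directory.
# Note: the training cell above saves to "my_model.keras"; point model_path at that
# file instead to evaluate the model trained in this notebook.
model_path = 'models/next_lat_long_model.h5'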
model = load_model(model_path, compile=False)

# Compile the model (useful for evaluation or further training)
model.compile(loss=MeanSquaredError(), optimizer='adam')

# Predict on the scaled test dataset
y_pred_scaled = model.predict(X_test)

# Inverse transform to get actual lat/long values
y_pred = scaler_y.inverse_transform(y_pred_scaled)
y_true = scaler_y.inverse_transform(y_test)

# Compute performance metrics (in degrees, averaged over longitude and latitude)
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

print(f"Mean Absolute Error (MAE): {mae}")
print(f"Root Mean Squared Error (RMSE): {rmse}")

# Create a DataFrame to compare true vs predicted values
results_df = pd.DataFrame({
    'True_Longitude': y_true[:, 0],
    'True_Latitude': y_true[:, 1],
    'Predicted_Longitude': y_pred[:, 0],
    'Predicted_Latitude': y_pred[:, 1]
})

# Signed per-coordinate errors, plus the Euclidean error in degree space
results_df['Longitude_Error'] = results_df['True_Longitude'] - results_df['Predicted_Longitude']
results_df['Latitude_Error'] = results_df['True_Latitude'] - results_df['Predicted_Latitude']
results_df['Absolute_Error'] = np.sqrt(
    results_df['Longitude_Error']**2 + results_df['Latitude_Error']**2
)

# Export the results to a CSV file
results_df.to_csv('test_predictions_with_errors.csv', index=False)

# Preview the first few rows
print("\nTest Predictions with Errors:")
print(results_df.head())
277/277 ━━━━━━━━━━━━━━━━━━━━ 0s 948us/step
Mean Absolute Error (MAE): 1.2607183889833222
Root Mean Squared Error (RMSE): 2.5749299248374697

Test Predictions with Errors:
   True_Longitude  True_Latitude  Predicted_Longitude  Predicted_Latitude  \
0        35.02833       36.69917            33.473579           33.341965   
1        18.72817       54.68117            17.806458           50.737865   
2        43.10750       11.54833            42.125614           17.425890   
3        36.68800       65.00750            35.852833           64.844254   
4        32.79250       -1.80633            32.766445           -2.858345   

   Longitude_Error  Latitude_Error  Absolute_Error  
0         1.554751        3.357205        3.699740  
1         0.921712        3.943305        4.049593  
2         0.981886       -5.877560        5.959011  
3         0.835167        0.163246        0.850972  
4         0.026055        1.052015        1.052337  
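The MAE and RMSE above are in degrees, and the `Absolute_Error` column is a Euclidean distance in degree space. That does not map to a fixed ground distance, because a degree of longitude shrinks toward the poles. A minimal sketch of reporting the error in kilometres with the Haversine formula follows; `haversine_km` is an illustrative helper and the 6371 km Earth radius is an assumption.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# First row of the results table above: true position vs. predicted position.
error_km = haversine_km(36.69917, 35.02833, 33.341965, 33.473579)
```

Applied row by row to `results_df`, the same function would give a per-prediction error in kilometres that can be averaged or plotted.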
In [30]:
import folium
from folium import PolyLine, CircleMarker
import pandas as pd

# Load the predictions from the CSV in the dataset/ directory
# (the evaluation cell above writes 'test_predictions_with_errors.csv' to the working directory)
results_df = pd.read_csv('dataset/test_predictions_with_errors_2.csv')

# Sample the first N points for clarity in visualization
N = 20
sample_df = results_df.head(N)

# Initialize the map centered at the starting actual location
start_coords = [sample_df['True_Latitude'].iloc[0], sample_df['True_Longitude'].iloc[0]]
m = folium.Map(location=start_coords, zoom_start=5)

# Plot actual path (in blue)
actual_path = list(zip(sample_df['True_Latitude'], sample_df['True_Longitude']))
PolyLine(actual_path, color='blue', weight=4, opacity=0.8, tooltip="Actual Path").add_to(m)

# Plot predicted path (in red)
predicted_path = list(zip(sample_df['Predicted_Latitude'], sample_df['Predicted_Longitude']))
PolyLine(predicted_path, color='red', weight=4, opacity=0.8, tooltip="Predicted Path").add_to(m)

# Draw error lines and point markers
for _, row in sample_df.iterrows():
    actual_point = (row['True_Latitude'], row['True_Longitude'])
    predicted_point = (row['Predicted_Latitude'], row['Predicted_Longitude'])

    # Line connecting actual to predicted
    PolyLine([actual_point, predicted_point], color='gray', weight=1, opacity=0.5).add_to(m)

    # Markers
    CircleMarker(actual_point, radius=3, color='blue', fill=True, fill_color='blue').add_to(m)
    CircleMarker(predicted_point, radius=3, color='red', fill=True, fill_color='red').add_to(m)

# Show the map
m
Out[30]:
[Interactive folium map: actual path in blue, predicted path in red, gray lines joining each actual point to its prediction]

✅ Conclusion¶

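This project demonstrated an end-to-end pipeline for processing, modeling, and predicting wildlife movement from GPS tracking data. We worked with a real-world geospatial dataset, built fixed-length sequences per bird, and trained a GRU-based recurrent neural network to predict the next location. On the held-out test set, the evaluated model reached a mean absolute error of about 1.26° and an RMSE of about 2.57° in latitude and longitude.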

🚀 Future Scope¶

  • Integrate real-time animal movement data using APIs (e.g., Movebank)
  • Deploy the model as a Streamlit or Flask web app for conservationists
  • Expand to multi-species, multi-continent trajectory analysis
  • Add reinforcement learning to optimize migratory route prediction
  • Collaborate with wildlife reserves for real-world deployment

📌 Author: Aarjun Mahule
📅 Date: May 2025
📍 Nagpur, India